ohi logo
Coastal barometer blog | Coastal barometer website |

Here I will run the exploration of the response variables (aquaculture production and fisheries production) and of the covariates

library(vroom)
library(here)
library(lattice)
library(cowplot)

1 Load the data

seafood <-read.csv(here("prep", "output_data", "spfood_allvars.csv"))

Calculate total sea food production here as well. I also create a log-transformed fisheries catch, because it is highly skewed to the left (see histograms below)

seafood_prep <- seafood |> 
  rowwise() |> 
  mutate(totprod = sum(aqua_prod_ton, total_catch_ton, na.rm=T)) |> 
  mutate(aqua_prop = aqua_prod_ton/totprod) |> 
  mutate(fish_catch_log = log(total_catch_ton + 1)) |> 
  mutate(aqua_prod_log = log(aqua_prod_ton + 1))

2 Data exploration

I will use extensively the approaches and functions from the two Highland statistics ltd books: * Mixed effects models and extensions in ecology with R. (2009). Zuur, AF, Ieno, EN, Walker, N, Saveliev, AA, and Smith, GM. Springer

and

Cleveland dotplots of the variables

Mydotplot(seafood_prep[,c(7:14)])

We observe an excess of zeros or close-to-zero values in the population, population growth, unemployed, and total catch variables.

Check the % of zeroes in the variables:

zeros_prop <- function(x){ #x is a numerical vector
  rows = sum(!is.na(x)) 
  zeros = sum(x == 0, na.rm = T)
  prop = zeros/rows
  prop
}
myvars <- seafood_prep[,c("aqua_prod_ton",
                          "total_catch_ton",
                          "distance",
                          "population",
                          "popgrowth",
                          "sea_area",
                          "percent_wforce",
                          "unemployed"
                          )]
map(myvars, ~zeros_prop(.x))
## $aqua_prod_ton
## [1] 0.00119
## 
## $total_catch_ton
## [1] 0.154
## 
## $distance
## [1] 0
## 
## $population
## [1] 0.00928
## 
## $popgrowth
## [1] 0.0441
## 
## $sea_area
## [1] 0
## 
## $percent_wforce
## [1] 0.00812
## 
## $unemployed
## [1] 0

Only fisheries has 15% of zeros, other variables do not suffer from zero inflation

Check out the distribution of variables:

varnames <- list("Aquaculture", 
                  "Fisheries", 
                 "Distance South to North",
                 "Population", 
                 "Population growth",
                 "Sea area",
                  "Percent in workforce",
                 "Number of unemployed"  )


par(mfrow = c(2,4))
hist <- map2(myvars,varnames, ~ hist(.x, main = .y, col = "seagreen", xlab = " ", ylab = "", cex.main=1.1))

2.1 Historgrams of the 59 selected municipalities

For the mixed models, I will use only 59 municipalities, that is, 817 out of 862 observations (that had at least 10 years of data). Let’s see if historgamrs are differenct for these observations.

mcp59 <- read_csv(here::here("prep", "output_data", "mcp_59selected.csv"))

myvars59 <- mcp59[,c("aqua_prod_ton",
                          "total_catch_ton",
                          "distance",
                          "population",
                          "popgrowth",
                          "sea_area",
                        "percent_wforce",
                          "unemployed")]


par(mfrow = c(2,4))
hist <- map2(myvars59,varnames, ~ hist(.x, main = .y, col = "seagreen", xlab = " ", ylab = ""))

Most of the variables are left-skewed, but fisheries may need to be transformed, or later, the response variable aquaculture/(aquaculture + fisheries) may need to be transformed.

The boxplots of the variables, to check if there are outliers (formally): yes, since the variables are all left-skewed

par(mfrow = c(2,4))
map2(myvars,varnames, ~ boxplot(.x, main = .y, col = "seagreen", xlab = " "))

Let’s take a look at the variable aqua_prop which is a proportion of aquaculture in the total seafood production and at the log of the fisheries catches

par(mfrow=c(1,3))
hist(seafood_prep$aqua_prop, main = "Aquaculture proportion", xlab = "", col = "orchid")
hist(seafood_prep$fish_catch_log, main = "Log of fisheries catches", xlab = "", col = "orchid")
hist(seafood_prep$aqua_prod_log, main = "Log of aquaculture production", xlab = "", col = "orchid")

3 Collinearity

Mypairs(myvars59)

Clear association is only seen for the population and growth, and between the number of unemployed and population growth (positive relationship in both cases).

4 Responses versus covariates

4.1 Aquaculture proportion versus covariates

MyX <- c("year", "sea_area", "population", "popgrowth" , "unemployed", "distance", "percent_wforce")
MyMultipanel.ggp2(Z = seafood_prep,
                  varx = MyX,
                  vary = "aqua_prop",
                  ylab = "aquaculture % in seafood production",
                  addSmoother = TRUE,
                  addRegressionLine = FALSE,
                  addHorizontalLine = FALSE) 
## Warning: Removed 210 rows containing non-finite values (stat_smooth).
## Warning: Removed 210 rows containing missing values (geom_point).

No clear associations or patterns, but i need to consider transformation of a response variable.

4.2 Aquaculture production versus covariates

What if aquaculture production alone would be used as a response variable? Let’s see also if fisheries catch has anything to do with aquaculture produciton

MyMultipanel.ggp2(Z = seafood_prep,
                  varx = MyX,
                  vary = "aqua_prod_ton",
                  ylab = "aquaculture production",
                  addSmoother = TRUE,
                  addRegressionLine = FALSE,
                  addHorizontalLine = FALSE) 
## Warning: Removed 210 rows containing non-finite values (stat_smooth).
## Warning: Removed 210 rows containing missing values (geom_point).

This looks more interesting! Relationships are close to linear, but this result include all municipalities, it might be different per municipality.

4.3 Fisheries catch versus aquaculture

Might be interesting to examine their associations statistically

fish_aqua <-ggplot(
  data = seafood_prep,
  mapping = aes(x = total_catch_ton, y = aqua_prod_ton)) +
  geom_point() +
  geom_smooth()

fish_aqua
## Warning: Removed 25 rows containing non-finite values (stat_smooth).
## Warning: Removed 25 rows containing missing values (geom_point).

4.4 Fisheries production versus covariates

MyMultipanel.ggp2(Z = seafood_prep,
                  varx = MyX,
                  vary = "total_catch_ton",
                  ylab = "fisheries catch",
                  addSmoother = TRUE,
                  addRegressionLine = FALSE,
                  addHorizontalLine = FALSE) 
## Warning: Removed 35 rows containing non-finite values (stat_smooth).
## Warning: Removed 35 rows containing missing values (geom_point).

I maybe will need to transform fisheries catch to log(catch) for the analysis

5 Checking the distribution of responce variables per municipality

Just to see if the distribution of fisheries and aquaculture production over years look very different when plotted per municipality. There is a clear increase in the production over years!

ggplot(data = seafood_prep) +
  geom_point(mapping = aes(x = year, y =  aqua_prod_ton), size = 0.5) +
  geom_smooth(mapping = aes(x = year, y =  aqua_prod_ton), size = 0.4) +
  facet_wrap( municip ~ .) +
  theme(
    axis.text.x = element_text(size = 10, angle = 90),
    legend.position = "none",
    panel.background = element_rect(fill = "white")) 

Same but for fisheries

ggplot(data = seafood_prep) +
  geom_point(mapping = aes(x = year, y =  total_catch_ton), size = 0.5) +
  geom_smooth(mapping = aes(x = year, y =  total_catch_ton), size = 0.4) +
  facet_wrap( municip ~ .) +
  theme(
    axis.text.x = element_text(size = 10, angle = 90),
    legend.position = "none",
    panel.background = element_rect(fill = "white"))

6 Verifying dependence in the response variables

I am already quite sure that both responses will be temporally and spatially dependent. But to test for that formally, we can use an autocorrelation test:

seafood_temp <- arrange(seafood_prep, year) 
  acf(seafood_temp$aqua_prod_ton, na.action = na.pass)

acf(seafood_temp$total_catch_ton)